
    Computing Covers Using Prefix Tables

    An indeterminate string x = x[1..n] on an alphabet Σ is a sequence of nonempty subsets of Σ; x is said to be regular if every subset is of size one. A proper substring u of regular x is said to be a cover of x iff for every i ∈ 1..n, an occurrence of u in x includes x[i]. The cover array γ = γ[1..n] of x is an integer array such that γ[i] is the length of the longest cover of x[1..i]. Fifteen years ago a complex, though nevertheless linear-time, algorithm was proposed to compute the cover array of regular x based on prior computation of the border array of x. In this paper we first describe a linear-time algorithm to compute the cover array of a regular string x based on the prefix table of x. We then extend this result to indeterminate strings. Comment: 14 pages, 1 figure
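    To make the definitions concrete, the short Python sketch below computes the cover array of a regular string naively, straight from the definition; it is not the paper's linear-time algorithm, and the function names are illustrative only. A cover of a string is necessarily one of its proper prefixes (the occurrence covering position 1 must start at position 1), so only prefixes are tested.

        def is_cover(u, x):
            # True iff every position of x lies inside some occurrence of u.
            m, n = len(u), len(x)
            covered = [False] * n
            for i in range(n - m + 1):
                if x[i:i + m] == u:
                    for j in range(i, i + m):
                        covered[j] = True
            return all(covered)

        def cover_array(x):
            # gamma[i] = length of the longest cover of x[0..i], or 0 if none (naive).
            n = len(x)
            gamma = [0] * n
            for i in range(n):
                prefix = x[:i + 1]
                for length in range(i, 0, -1):      # proper prefixes only
                    if is_cover(prefix[:length], prefix):
                        gamma[i] = length
                        break
            return gamma

        # "abaababa" has longest cover "aba", so the last entry is 3.
        print(cover_array("abaababa"))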

    Inferring an Indeterminate String from a Prefix Graph

    An indeterminate string (or, more simply, just a string) x = x[1..n] on an alphabet Σ is a sequence of nonempty subsets of Σ. We say that x[i_1] and x[i_2] match (written x[i_1] ≈ x[i_2]) if and only if x[i_1] ∩ x[i_2] ≠ ∅. A feasible array is an array y = y[1..n] of integers such that y[1] = n and, for every i ∈ 2..n, y[i] ∈ 0..n-i+1. A prefix table of a string x is an array π = π[1..n] of integers such that, for every i ∈ 1..n, π[i] = j if and only if x[i..i+j-1] is the longest substring at position i of x that matches a prefix of x. It is known from [CRSW13] that every feasible array is a prefix table of some indeterminate string. A prefix graph P = P_y is a labelled simple graph whose structure is determined by a feasible array y. In this paper we show, given a feasible array y, how to use P_y to construct a lexicographically least indeterminate string on a minimum alphabet whose prefix table π = y. Comment: 13 pages, 1 figure
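    The prefix-table definition translates directly into a short quadratic Python sketch; this is only an illustration of the definition, using 0-indexed lists of sets, and not the paper's reconstruction from a prefix graph.

        def prefix_table(x):
            # x is a list of nonempty sets; two positions match iff the sets intersect.
            # pi[i] = length of the longest substring starting at i that matches
            # a prefix of x, so pi[0] == len(x).
            n = len(x)
            pi = [0] * n
            for i in range(n):
                j = 0
                while i + j < n and x[i + j] & x[j]:
                    j += 1
                pi[i] = j
            return pi

        # A regular string is the special case where every set has size one.
        print(prefix_table([{c} for c in "abaab"]))   # [5, 0, 1, 2, 0]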

    Deep network with score level fusion and inference-based transfer learning to recognize leaf blight and fruit rot diseases of eggplant

    Eggplant is a popular vegetable crop, and eggplant yields can be affected by various diseases. Automatic detection and recognition of diseases is an important step toward improving crop yields. In this paper, we used a two-stream deep fusion architecture, employing CNN-SVM and CNN-Softmax pipelines, along with an inference model to infer the disease classes. A dataset of 2284 images was sourced from primary sources (a consumer RGB camera) and secondary sources (the internet). The dataset contained images of nine eggplant diseases. Experimental results show that the proposed method achieved higher accuracy and fewer false positives than other deep learning methods (such as VGG16, InceptionV3, VGG19, MobileNet, NASNetMobile, and ResNet50).
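    The abstract does not spell out the fusion rule, so the sketch below shows only one common form of score-level fusion, a weighted average of the per-class scores from the two streams; the weight w, the array shapes, and the function names are assumptions, not the paper's exact fusion or inference model.

        import numpy as np

        def fuse_scores(p_softmax, p_svm, w=0.5):
            # p_softmax, p_svm: (n_samples, n_classes) scores from the CNN-Softmax
            # and CNN-SVM streams (SVM decision values rescaled to [0, 1] beforehand).
            fused = w * p_softmax + (1.0 - w) * p_svm
            return fused / fused.sum(axis=1, keepdims=True)   # renormalise each row

        def predict_classes(p_softmax, p_svm, class_names, w=0.5):
            fused = fuse_scores(p_softmax, p_svm, w)
            return [class_names[i] for i in fused.argmax(axis=1)]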

    String Comparison in V-Order: New Lexicographic Properties & On-line Applications

    V-order is a global order on strings related to Unique Maximal Factorization Families (UMFFs), which are themselves generalizations of Lyndon words. V-order has recently been proposed as an alternative to lexicographical order in the computation of suffix arrays and in the suffix-sorting induced by the Burrows-Wheeler transform. Efficient V-ordering of strings thus becomes a matter of considerable interest. In this paper we present new and surprising results on V-order in strings, then go on to explore the algorithmic consequences.

    Algorithms to Compute the Lyndon Array

    We first describe three algorithms for computing the Lyndon array that have been suggested in the literature, but for which no structured exposition has been given. Two of these algorithms execute in quadratic time in the worst case; the third achieves linear time, but at the expense of prior computation of both the suffix array and the inverse suffix array of x. We then go on to describe two variants of a new algorithm that avoids prior computation of global data structures and executes in O(n log n) worst-case time. Experimental evidence suggests that all but one of these five algorithms require only linear execution time in practice, with the two new algorithms faster by a small factor. We conjecture that there exists a fast and worst-case linear-time algorithm to compute the Lyndon array that is also elementary (making no use of global data structures such as the suffix array).
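    For reference, the Lyndon array lambda[1..n] of x stores at position i the length of the longest Lyndon word starting at x[i]. The brute-force Python sketch below exists only to pin down that definition; the algorithms discussed in the paper compute the same array far more efficiently.

        def is_lyndon(s):
            # A nonempty string is a Lyndon word iff it is strictly smaller
            # (lexicographically) than every one of its proper suffixes.
            return all(s < s[i:] for i in range(1, len(s)))

        def lyndon_array(x):
            # lam[i] = length of the longest Lyndon word starting at position i.
            n = len(x)
            lam = [1] * n
            for i in range(n):
                for j in range(i + 1, n + 1):
                    if is_lyndon(x[i:j]):
                        lam[i] = j - i
            return lam

        print(lyndon_array("abaab"))   # [2, 1, 3, 2, 1]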

    Automatic log parser to support forensic analysis

    Event log parsing is the process of splitting and labelling each field in a log entry. Existing approaches commonly use regular expressions or parsing rules to extract the fields. However, such techniques are time-consuming, as a forensic investigator needs to define a new rule for each log file type. In this paper, we present a tool, namely nerlogparser, to parse log entries automatically, where log parsing is modeled as a named entity recognition problem. We use a deep learning technique, specifically bidirectional long short-term memory (BiLSTM) networks, as the underlying architecture for this purpose. Unlike existing tools, nerlogparser is fully automatic, as investigators do not need to define any parsing rules, and generic, as a single model parses various types of log files. Experimental results show that nerlogparser achieves superior performance compared with traditional machine learning methods.
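    As a rough illustration of the underlying architecture, the Keras sketch below builds a BiLSTM token tagger of the kind used when log parsing is framed as named entity recognition; the vocabulary size, label set, and hyperparameters are placeholders, not the actual nerlogparser configuration.

        from tensorflow.keras import layers, models

        VOCAB_SIZE = 5000    # word-level vocabulary size (assumed)
        NUM_LABELS = 6       # e.g. timestamp, hostname, service, pid, message, other

        model = models.Sequential([
            layers.Embedding(VOCAB_SIZE, 64, mask_zero=True),
            layers.Bidirectional(layers.LSTM(100, return_sequences=True)),
            layers.TimeDistributed(layers.Dense(NUM_LABELS, activation="softmax")),
        ])
        model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")

        # X: (n_entries, max_len) padded token ids; y: (n_entries, max_len) label ids
        # model.fit(X, y, epochs=5)   # each token's predicted label names its log field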

    Graph clustering and anomaly detection of access control log for forensic purposes

    Attacks on operating system access control have become a significant and increasingly common problem. This type of security threat is recorded in forensic artifacts such as the authentication log, which forensic investigators generally examine to analyze such incidents, since an anomaly is highly correlated with an attacker's attempts to compromise the system. In this paper, we propose a novel method to automatically detect anomalies in the access control log of an operating system. The logs are first preprocessed and then clustered using an improved MajorClust algorithm to obtain better clusters. This technique provides parameter-free clustering, so it can automatically produce an analysis report for the forensic investigators. The clustering results are then checked for anomalies based on a score that considers factors such as the number of members in a cluster, the frequency of the events in the log file, and the inter-arrival time of a specific activity. We also provide a graph-based visualization of the logs to assist the investigators with easy analysis. Experimental results compiled on an open dataset of a Linux authentication log show that the proposed method achieved an accuracy of 83.14% on the authentication log dataset.
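    The abstract names the factors that feed the anomaly score (cluster size, event frequency, inter-arrival time) but not how they are combined; the Python sketch below is one plausible way to combine them, with the equal weighting and normalisation chosen purely for illustration rather than taken from the paper.

        import numpy as np

        def anomaly_score(cluster_size, event_count, total_events, inter_arrival_secs):
            # Higher score = more anomalous.
            size_term  = 1.0 - cluster_size / total_events        # small clusters stand out
            freq_term  = 1.0 - event_count / total_events          # rare event types stand out
            burst_term = 1.0 / (1.0 + np.mean(inter_arrival_secs)) # rapid repeats stand out
            return (size_term + freq_term + burst_term) / 3.0

        # e.g. a 4-member cluster of a rare event repeated roughly every 2 seconds
        print(anomaly_score(4, 4, 1000, [2.1, 1.8, 2.3]))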

    Frame-Wise dynamic threshold based polyphonic acoustic event detection

    Acoustic event detection, the determination of the acoustic event type and the localisation of the event, has been widely applied in many real-world applications. Many works adopt multi-label classification techniques to perform polyphonic acoustic event detection, with a global threshold used to detect the active acoustic events. However, the global threshold has to be set manually and is highly dependent on the database being tested. To deal with this, in this paper we replace the fixed global threshold with a frame-wise dynamic threshold. Two novel approaches, namely contour-based and regressor-based dynamic thresholds, are proposed in this work. Experimental results on the popular TUT Acoustic Scenes 2016 database of polyphonic events demonstrate the superior performance of the proposed approaches.
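    To illustrate what replacing a global threshold with a frame-wise one means in practice, the sketch below derives a separate threshold for every frame from that frame's own score distribution; the specific rule (a point between the frame's mean and maximum score, controlled by alpha) is an illustrative stand-in, not the contour- or regressor-based thresholds proposed in the paper.

        import numpy as np

        def detect_events(probs, alpha=0.5):
            # probs: (n_frames, n_classes) per-class activity probabilities.
            # Each frame gets its own threshold instead of one global value.
            frame_thr = (1 - alpha) * probs.mean(axis=1) + alpha * probs.max(axis=1)
            return probs >= frame_thr[:, None]     # boolean (n_frames, n_classes)

        probs = np.array([[0.90, 0.60, 0.10],
                          [0.20, 0.30, 0.25]])
        print(detect_events(probs))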

    Consumer perceptions in the adoption of the electronic health records in Australia: A pilot study

    The paper reports an empirical investigation of the factors affecting consumer perceptions of the adoption of Electronic Health Records (EHR) in Australia. It also details the pilot testing of the survey instrument, which was administered to a convenience sample by sending individual postal survey envelopes to shortlisted community organisations in Australia. Reliability analysis to check internal consistency was performed using Cronbach's alpha, and content validity was achieved by reviewing the instrument with a panel of experts. The results of this pilot study demonstrated the feasibility of a full-scale study and could be used as the basis for refining the instrument; based on the outcome of the validity and reliability testing, items for the final instrument were identified. The findings showed that the tested model fits the data well and has a significant and positive impact on consumers' attitudes towards using the EHR.
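    For reference, the Cronbach's alpha statistic used in the reliability analysis can be computed as in the short, generic Python sketch below; this is the textbook formula, not the study's actual analysis script.

        import numpy as np

        def cronbach_alpha(scores):
            # alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
            # scores: (respondents x items) matrix of Likert-scale responses.
            scores = np.asarray(scores, dtype=float)
            k = scores.shape[1]
            item_vars = scores.var(axis=0, ddof=1).sum()
            total_var = scores.sum(axis=1).var(ddof=1)
            return (k / (k - 1)) * (1 - item_vars / total_var)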